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Abstract Data assimilation techniques, developed in the last two decades mainly for weather 
prediction, produce better forecasts by taking advantage of both theoretical/numerical mod- 
els and real-time observations. In this paper, we explore the possibility of applying the 
data-assimilation techniques known as 4D-VAR to the prediction of solar flares. We do so in 
the context of a continuous version of the classical cellular-automaton-based self-organized 
critical avalanche models of solar flares introduced by Lu and Hamilton (Astrophys. J., 
380, L89, 1991). Such models, although a priori far removed from the physics of magnetic 
reconnection and magnetohydrodynamical evolution of coronal structures, nonetheless re- 
produce quite well the observed statistical distribution of flare characteristics. We report 
here on a large set of data assimilation runs on synthetic energy release time series. Our 
results indicate that, despite the unpredictable (and unobservable) stochastic nature of the 
driving/triggering mechanism within the avalanche model, 4D-VAR succeeds in producing 
optimal initial conditions that reproduce adequately the time series of energy released by 
avalanches/flares. This is an essential first step towards forecasting real flares. 

Keywords: Sun: flares, data assimilation, self-organized criticality 
1. Introduction: Flares and Self-Organized Criticality 

Spatially-resolved observations of solar flares have revealed a very broad range of scales 
in the flaring phenomenon. Probability distributions of global flare characteristics such as 
peak flux, energy release, and duration are now known to take the form of power laws 
spanning many decades in size (eight in the case of flare energy; see, e.g., Dennis, 1985; Lu 
et al, 1993; Aschwanden et al., 2000). This is surprising because the vast majority of flares 
occur in active regions and activity complexes that have global characteristics (linear size, 
magnetic flux, peak field strength) that are much more narrowly distributed. This indicates 
that the flaring phenomenon is intrinsically scale-free, even though its energy reservoir may 
not be. The relatively-slow evolution of active regions is also in stark contrast to the short 
energy release timescale associated with the flaring phenomenon. 
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Avalanches are one class of physical phenomena that are characterized by intermittent, 
scale-free energy release even under conditions of slow, continuous energy loading. In the 
flare context, the physical picture usually invoked is that of coronal magnetic structures 
being slowly and stochastically forced by photospheric fluid motions, leading to the gradual 
buildup of electrical current sheets in the coronal plasma (Parker, 1983, 1988). Plasma in- 
stabilities eventually trigger magnetic reconnection at an unstable site, leading to alterations 
of the physical conditions at neighbouring current sheets that can then themselves become 
unstable, and so on in classical avalanching style. Numerous avalanche models inspired by 
this general scenario have been developed to describe solar flares (see Charbonneau et al, 
2001, and references therein). Many of these models manage to produce avalanche size 
distributions having the form of power laws, with logarithmic exponent comparing fairly 
well to observationally-determined values (e.g., Lu et al., 1993; Georgoulis and Vlahos, 
1998; Aschwanden and Charbonneau, 2002). Most of these models are based on the idea 
of self-organized criticality (SOC) (Bak, Tang, and Wiesenfeld, 1987; Jensen, 1998). The 
"criticality" is akin to phase change in equilibrium thermodynamics, where the effects of 
a small, localized perturbation can be felt on a dynamical timescale throughout the whole 
system. The system is said to be "self-organized" when this critical state is a dynamical 
attractor and is reached in response to external forcing without requiring fine tuning of 
a control parameter. Generally, SOC is found in slowly-driven, open, dissipative systems 
subjected to a self-limiting local threshold instability. The threshold to the instability is 
crucial, as it allows the system to transit from one metastable state to another while pre- 
venting the dynamics to be governed by external forcing (Jensen, 1998). Potential examples 
in the natural world include various forms of sandpiles, avalanches and landslides, but also 
earthquakes, forest fires, hydrological networks, traffic jams, magnetospheric substorms, 
and solar flares (see Bak, 1996, for a spirited exposition). 

If flares are truly a manifestation of SOC dynamics, then the outlook for accurate flare 
forecasting would appear, a priori, pretty grim indeed. Nothing fundamentally distinguishes 
a large flare from a small one, flare size simply being a matter of the number of current sheets 
involved in the avalanche of reconnection events. Even worse, the triggers of flares large and 
small are the same, namely a small (quite possibly unobservable) perturbation affecting the 
system somewhere locally. However, the occurrence of a large avalanche is only possible if 
a large, "connected" portion of the system is close to the avalanching threshold. The state of 
the system, in turn, is a function of its prior history, and in particular of the past occurrence 
of avalanches, of which the larger ones are (presumably) observable. In other words, past 
avalanching behaviour holds clues to the current state of the system, and therefore to its 
potential avalanching behaviour. 

The question is then: can this information be retrieved and used to produce reliable 
avalanche forecasts, despite the stochastic nature of the driving/triggering mechanism? This 
is the central question we address in this series of papers, using data-assimilation techniques. 
This first paper describes the SOC avalanche model and data assimilation technique we are 
developing towards forecasting, and demonstrates that the resulting scheme can adequately 
reproduce the avalanching behaviour of the system even in the absence of detailed informa- 
tion on the spatio-temporal behaviour of the stochastic driver. This is a first essential step 
towards forecasting, which is the topic of the subsequent papers in the series. 

This first paper is organized as follows: Section 2 gives an overview of a simple, "clas- 
sical", discrete SOC model based on a cellular automaton, as well as a continuous analog 
described by a partial differential equation reversed-engineered from the discrete cellular 
automaton rules. This is the model used in Section 3 in conjunction with the 4D-VAR data 
assimilation techniques. Section 4 presents results for a wide set of validation experiments 
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demonstrating that 4D-VAR can successfully reproduce the avalanching behaviour present 
in energy release time series, despite the stochastic nature of the external forcing that loads 
energy into the system and triggers avalanches. We conclude in Section 5 by summarizing 
the main results of this study, and outlining the road lying ahead, towards forecasting real 
solar flares. 



2. Avalanche Models 

2. 1 . The Lu and Hamilton Model 

The sandpile has now become the icon of SOC systems (Bak, Tang, and Wiesenfeld, 1987). 
As sand grains are dropped one by one on a flat surface, a sandpile will build up, with 
occasional avalanches of various sizes, until the pile has reached a conical shape with the 
slope everywhere at or near the angle of repose. Addition of more sand grains can now 
trigger large avalanches disrupting the whole slope, or the toppling of only a few sand grains, 
or nothing at all. The system has reached a statistically stationary state where, averaged over 
a long-enough temporal interval, as many sand grains fall off the pile as are dropped on it. 
Notice that while the loading is slow and gradual, the unloading is strongly intermittent and 
involves avalanches of all sizes. 

The statistical physics of sandpiles has been extensively studied using cellular automata 
models, where the sandpile is replaced by a lattice of locally interconnected nodes on which 
a nodal variable related to energy is defined (Kadanoff et al, 1989). In the context of solar 
flares, the first such model is to be found in the groundbreaking work of Lu and Hamil- 
ton (1991) and Lu et al. (1993) (but see also Zirker and Cleveland, 1993). Consider the 
following two-dimensional scalar version of the Lu and Hamilton (1991) model; a scalar 
nodal quantity {A" j) is defined over a N x N regular cartesian grid with nearest-neighbour 
connectivity (top-ndown-i-right-i-left; see, e.g.. Figure 1 in Charbonneau et al, 2001). Here 
the superscript n is a discrete time index, and the subscript pair (/, j) identifies a node on the 
2D lattice. The cellular automaton is driven by adding small increments in A at randomly 
selected nodes in the lattice (one per time step), analogous here to dropping sand grains on 
the pile. A stability criterion is defined in terms of the local curvature of the field at node 

iijy- 

^'lj=^'Lj~'^ 52 '^neighbours I W 
neighbours 

where the sum runs over the four nearest neighbours at nodes (!,y'± 1) and (i ± 1,7). If this 
quantity exceeds some pre-set threshold (Ac) (analogous to the slope exceeding the angle of 
repose on the sandpile), then A" j is redistributed to its four nearest neighbours according to 
the following rules: 

A'lf^A'lj-^AA^j, (2) 

These redistribution rules are conservative in A; however, if one identifies A~ with a measure 
of energy (more on this below), then it is readily verified that they lead to a decrease of the 
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total lattice energy: 



(4) 



the excess energy being what is liberated in the "flare". Lu and Hamilton (1991) and Lu 
et al. (1993) have shown that this driven cellular automaton can produce avalanches with 
robust power-law exponents resembling those inferred from flare observations (see also 
Charbonneau et al., 2001, and references therein). 

Giving a physical meaning to the scalar A is not trivial. If A is considered as being the 
magnetic field, then V ■ A / 0. Taking A as the vector potential automatically solves the 
non-null divergence problem. However, T.i.ji^l j)^ is then no longer an obvious measure of 
magnetic energy (Charbonneau et al., 2001). Since V x A = B, dimensional analysis does 
suggest |Ap ~ L^|B|^ so we can at first assume that the vector potential scales with the 
magnetic field, a conjecture supported by the numerical results of Isliker, Anastasiadis, and 
Vlahos (2000), using an avalanche model closely related to that described above. Variations 
in lattice energy (Z?/) from one time step to the next during an avalanche then yields the 
energy liberated at each time step: 



The resulting time series of energy release is the target for data assimilation. In the Lu and 
Hamilton discrete model, E,- can be calculated analytically knowing the redistribution rule 
and stability measure, but this will no longer be the case with the continuous analog to be 
introduced shortly; Equation (5) applies equally well to both classes of models. 

At this juncture it must be emphasized that the stochasticity in the Lu and Hamilton 
model (and other published variations thereof) is not a mere noise-like "inconvenience" 
superimposing itself on an underlying deterministic flaring process. In the Lu and Hamil- 
ton SOC avalanche model, stochastic forcing plays the dual role of energy loading and 
avalanche triggering. The model thus has a fundamentally stochastic component. 

2.2. A Continuous SOC Model 

2.2.1. Reverse Engineering 

The structure of Equations (2) and (3) suggests that they can be interpreted as the result 
of applying centered, second-order, finite difference and a one-step, explicit, time-stepping 
algorithm to some partial differential equation (PDE) in two spatial dimension discretized 
on a regular cartesian grid. Applying this reverse engineering approach (see also Liu et al., 
2002) leads to the following PDE describing the evolution of A during an avalanche: 



Likewise, the RHS of Equation (1) is readily interpreted as a second-order centered finite- 
difference representation of the Laplacian operator acting on A, so that the diffusion coeffi- 
cient v(V^A) appearing in Equation (6) is given by: 




lattice avalanching 
otherwise. 



(5) 




(6) 




(7) 
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where is the stability threshold. The numerical value of Va depends on the redistribution 
rules that were used in the discrete equations (Equations (2) and (3)) and on the grid size. 
In the case of the Lu and Hamilton model, in two spatial dimensions, Va = A~/20 {Ax = 
Ay = A), with units of [time]/[Iength]^ implicitly included in the denominator, a value used 
in all calculations reported upon below. In analogy with the discrete model, we identify the 
functional A{x,y, t) with a measure of the magnetic vector potential, and assume that is a 
measure of energy. 

Equation (6) is a fourth-order "hyper-diifusion" equation, albeit a strongly nonlinear one 
since the hyper-diffusion coefficient is only non-zero when the system is avalanching, and 
even then it remains a discontinuous function of position, being only non-zero at unstable 
nodes. If all nodes are stable, then the quantity A evolves in response to the external forcing 
only. In the classical cellular automaton, at each (non-avalanching) time step, a small incre- 
ment of random amplitude in A is added to a single randomly selected node of the lattice. 
Reverse-engineering of this forcing rule immediately leads to: 



where the forcing location (jro,3'o) varies randomly in time. The continuous model then 
amounts to solving Equation (8) when the system is everywhere stable, and switching to 
Equation (6) if one or more nodes become unstable. Note that this implies that the forcing 
{Fr) is turned off during avalanches. This is the same procedure used in the cellular au- 
tomaton, and amounts to assuming that there exists a wide separation of timescales between 
forcing and avalanching, which in fact is well justified in the solar coronal context (see, e.g., 
Luetal, 1993). 

2.2.2. Sample Numerical Results 

We now proceed to solve numerically the continuous-avalanche equation (Equations (6) - 
(8)), by discretizing it using a centered finite-difference scheme with forward Euler differ- 
encing for the time derivative. The reader may rightfully wonder why then did we bother 
reverse-engineering the discrete Lu and Hamilton model in the first place, but the need to 
proceed in this way will become clear presently. Moreover, in solving Equation (6) numeri- 
cally as one would any partial differential equations, subtle differences are introduced with 
respect to the truly discrete model, and these must be clarified. 

A square domain of linear size L = 2;r is partitioned using a regular cartesian grid 
(A.X = Ay = A). Imposing A = on domain boundaries, we solve dimensionless forms of 
the avalanche equations (Equations (6) - (8)), using the hyper-diffusion time scale 



as a time unit. This comsponds to the time it would take an avalanche to sweep across 
the length of the domain. Following a dimensional analysis of the continuous avalanche 
equation, x and y will now be in L units and t in Ty units. The diffusion coefficient v, in 
the non-dimensional equation, is equal to (stable) or to 1 (avalanching). For the scheme 
to remain numerically stable, a von Neumann stability analysis (Press et ah, 1992) indicates 
that the dimensional time step must be chosen such that 



3-=F«(xo(0,3'o(0) 



(8) 




(9) 




(10) 
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0.15 
Time [xj 

Figure 1. Evolution of the lattice energy as a function of time. The lattice energy increases from the null 
state (A = 0) to the SOC state as the energy input by the perturbations is greater than the energy released in 
avalanches. At / ~ 0.23, the SOC state is reached: the energy added to the system is released by avalanches, 
hence the plateau. The top left inset is a cross-section of the lattice variable A along the x-axis, taken at 
different times. The correspondingly labeled solid dots along the lattice energy curve indicate when the 
cross-sections were taken. The bottom right inset is a zoom of the lattice energy curve about point "d", 
and shows the energy variation in the SOC state, with vertical axis covering five energy units. 



At 1 /A^ 



or, in terms of dimensionless time: 



For our working 48 x 48 mesh, 5 (f ) = 4.7 x IQ-^ so a non-dimensional time step of size 
5.5 X 10^'^ safely satisfies that condition, and is used for all simulations unless otherwise 
specified. 

The stability threshold is set at Ac = 7.0. Sequences of uniform random deviates are used 
to define the stochastic forcing. Both the location and magnitude of the perturbation are 
randomly determined, the former excluding boundary nodes and the latter constrained to 
the pre-set interval [— 1 x 10^^ : 4 x 10^^]. The perturbations have non-zero mean to ensure 
buildup of A [x, y) from the initial state A(x,y) = throughout. This is again in direct analogy 
to the Lu and Hamilton model described earlier. Because the continuous model is discretized 
with centered second-order finite differences, boundary conditions for the two rows of grid 
points along the boundary are needed. The grid points along the boundary are set to A = 
(as in the discrete model) while the second row is set to V^A = through the use of the 
fictional-point method (Mitchell and Griffiths, 1980). 

The first task is to bring the system to the SOC state. Starting with A = everywhere, 
the system is driven to the SOC state by the external forcing, interrupted by avalanching 
episodes whenever and wherever the stability threshold is exceeded. Figure 1 illustrates 
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Time [1^,] 

Figure 2. Evolution of the energy released as a function of time. The large avalanches between f ~ 0. 1 and 
f ~ 0. 14 are due to a transient SOC state that is created when the stability criterion of the central region takes 
a value of ~ +Ar while the periphery have ^ —A^. This is only a transitional step as the center will fill up to 
take its final shape of an inverse parabola ("b"^"d" on the inset in Figure 1). The inset is a zoom to show 
that we have individual avalanches separated by calm periods. 

the gradual buildup of lattice energy (l^A^) with time. Initially, the energy gained by the 
system from the perturbations exceeds the energy released in avalanches. Thus, there is a 
net increase of the system's energy. However, at f :^ 0.23 a plateau begins. The system has 
now reached a statistically-stationary state where, in a time-averaged sense, the buildup of 
A in response to forcing is balanced by A evacuated via the A = boundary conditions 
when avalanches reach the system's boundaries. Because large avalanches are more likely 
to reach the boundaries, their frequency of occurrence jumps markedly once the stationary 
state is reached {cf. Figure 2). This stationary state is the SOC state. The bottom right inset 
on Figure 1 shows a zooming on the behaviour of the lattice energy during the SOC state in 
the vicinity of point "d". Every decrease is the signature of an avalanche and the pronounced 
drop starting at ? = 0.2456 corresponds to the large avalanche in the inset of Figure 2. The 
top left inset shows cross-sections along the y-axis of the evolution of the scalar variable A at 
different times up to the SOC regime where it approximates the shape of an inverse parabola. 
A cross-section along the A-axis would yield the same result. All DNS and 4D-VAR runs 
discussed further below are carried out with the system in the SOC state. 

Figure 2 displays the energy released as a function of time. It is only in the SOC state that 
avalanches spanning the whole system can be produced with a significant frequency. For this 
reason, it will be particularly important to ensure that the system is kept in the SOC state 
throughout the data assimilation process. The large avalanches between f ~ 0. 1 and f ~ 0. 14 
(Figure 2) are due to a transient SOC state that is created when the stability criterion of the 
central region takes a value of ~ +Ac while the periphery have ~ —Ac (such as for the curve 
"b" on the top left inset of Figure 1). This is only a transitional step as the center will fill 
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up to eventually take its final shape of an inverse parabola and the system will then be in its 
true SOC state. 

Even though the avalanche model is a continuous one, it retains some discrete features. 
Usually, in the numerical solution of partial differential equations, reducing the time steps 
by a factor of two but using twice as many of these smaller time steps, should not affect the 
outcome significantly. However, this is not the case here, because of the spatio-temporally 
discrete nature of the hyper-diffusion coefficient, as the latter is only turned on when the 
system is unstable. When this happens, the run with the smaller time step (At) will re- 
distribute more finely, as seen in Equation (6). When stability is recovered, the formerly 
avalanching portion of the lattice will find itself closer to the stability threshold, thus the 
lattice energy will increase significantly. The redistribution is also affected by the value of 
the diffusion coefficient (with a dependency on the distance between the grid points) which 
must be lower for higher-resolution grids otherwise too much energy is redistributed and the 
system restabilizes too far below the stability threshold, thus leading to loss of SOC at the 
expense of large, quasi-periodic avalanches. 

The last difference between the Lu and Hamilton and continuous models pertains to the 
effect of an increase in the number of grid points. In the Lu and Hamilton model, the distance 
between each grid point is constant (A = 1) which leads to an increase of the size of the 
domain when the grid point number is increased. As for the continuous model, even though 
the domain size can be increased, we will keep it constant in this paper so an increase in the 
number of grid points will increase the resolution. The system will then be more effective 
in reproducing fine structures as seen in Figure 3, which shows the normalized frequency 
distributions for the stability criterion (Equation (1)) evaluated at each grid point at ten dif- 
ferent times for grid resolution of 24 x 24, 36 x 36, and 48 x 48. As with the discrete model, 
at any given time only a small fraction of nodes are very close to the stability threshold. 
The distribution is broader and centered further below the stability threshold for the higher 
resolution runs. The reason behind this behaviour is that the larger number of degrees of 
freedom in the high-resolution run allows for the presence of finer structures, which make it 
easier for the lattice to exceed the stability threshold, in the sense that fluctuations of AA" j 
about its lattice mean become larger as the spatial resolution is increased. 

2.2.3. Characteristics of Avalanches in the SOC State 

In the SOC state, several characteristics of avalanches have probability distribution functions 
(hereafter PDF) that behave as power laws. This is the case for the avalanche energy (E), 
namely the total energy released by the lattice over the duration of the avalanche; the peak 
energy release (P), corresponding to maximum energy released by the avalanche in a single 
time step; and the duration (T), which is simply the time elapsed from the beginning of 
an avalanche to its end. Figure 4 (panels A - C) show probability distribution functions for 
E, P and T, constructed from runs of different resolutions (24 x 24, 36 x 36, and 48 x 48) 
of 10^ time steps, spanning 1000 hyper-diffusion times, and each including up to 6 x 10^ 
avalanches. Least-squares fit of the form: 

f{x) = fox-" (12) 

are overplotted on the distributions. The fits used the data from the large-resolution runs 
(48 X 48). Because the mesh spacing affects the statistics for small avalanches, the first few 
bins are excluded from these fits. The power law indices obtained were Ue = 1.407 ±0.02, 
ap = 1.800 ±0.04, and ar = 2.067 ±0.03. These first two values are in good agreement 
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Figure 3. Normalized frequency distributions for the stability criterion evaluated at each grid point for runs 
of 24 X 24, 36x36, and 48 x 48. Data taken at ten different times, when the system was stable, were combined 
to generate the PDF. As expected, the peaks of the PDF lie close but below the stability threshold of A^. = 7.0. 
As the resolution increases, the distribution broadens and moves further from the thi'eshold. This is due to 
the increase of fine structures which permits more degrees of freedom to the large-resolution systems, thus 
making it easier to stray away from the threshold. 



with those obtained with the discrete formulation of this avalanche model (see Table II in 
Charbonneau et al. (2001)). The aj value is a bit higher, but is in fact more comparable with 
the values obtained by Lu et al. (1993) (Table 1). Figure 4 (panel D) show the frequency 
distribution of the waiting time (WTD; now on semi-log scale). The waiting time (AT) 
is the time interval during which the system is in a stable state between two consecutive 
avalanches. The WTD is well-fitted with exponential of the form: 

f{At)=foe-P^ , (13) 

here with an inverse e-folding timescale jS = 2158 ±4. The exponential form of the WTD 
comes from the statistical uniformity of the external forcing, here a stationary random pro- 
cess (see Wheatland, 2000). Other types of forcings can be introduced in SOC avalanche 
models for solar flares, producing WTD distributions in better agreement with observations 
(see, e.g., Norman et al., 2001), but for the purposes of the forthcoming validation exercise 
we retain the uniform driver of the original Lu and Hamilton model. 

To sum up, and notwithstanding some subtleties related to mesh refinement, numerical 
solutions of the reverse-engineered avalanche equation lead to avalanching behaviour es- 
sentially identical to that of the original, discrete model of Lu and Hamilton (1991). This 
"validation" of the continuous model may well appear tautological, but it represents an 
essential starting point to our forecasting scheme because recasting the model in continuous 
form is required by the 4D-VAR data assimilation formalism, the topic to which we now 
turn. 



3. Data Assimilation (4D-VAR) 

The underlying idea of data assimilation is to use observed data to introduce corrections to a 
model — often a numerical simulation — of a physical phenomenon, typically with the aim 
of correcting for missing data or to produce forecasts. Generally speaking, data-assimilation 
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Figure 4. Normalized frequency distributions for tlie total avalanche energy (£), the peak energy (P), the 
avalanche duration (7^ and the waiting time (AT), for runs of different spatial resolutions (24 x 24, 36 x 36, 
and 48 x 48) spanning lO' hyper-diffusion times, containing up to 6 x 10^ avalanches. Bins of a constant 
logarithmic width (Alog.v) = 0.2 were used to construct the PDFs. Power-law indices of «£ = 1 .407 ± 0.02, 
Up = 1 .800 ± 0.04, and Ur = 2.067 ±0.03 resulted from fitting data for the 48 x 48 run. An exponential was 
fitted to the waiting-time distribution, yielding an inverse e-folding time /3 = 2158±4. 
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methods can be subdivided into three categories: successive-correction, sequential (Kalman 
filter) and variational methods (Le Dimet and Talagrand, 1986; Daley, 1991; Kalnay, 2003). 
The successive-correction method, also known as Cressman method, consists in the cor- 
rection of a background field, previously obtained through a previous forecast or a trivial 
state due to physical constraints, until it includes the given observations (Cressman, 1959). 
Most sequential methods are based on the Kalman filter, which uses the model error and the 
observational error statistics to find the optimal combination of the model and observational 
data (Kantha and Clayson, 2000). The variational methods consist in finding the space-time 
trajectory of the state variables that will minimize a cost function measuring the discrep- 
ancy between the forecast and the observations (Talagrand and Courtier, 1987; Courtier and 
Talagrand, 1990). The 4D-VAR method belongs to this third class and will be the one used 
in this work. Technically speaking, as our model is in two-dimensional space, our method 
should be called (2-Hl)D-VAR. However, we will continue to refer to it as being 4D-VAR 
for generality purpose and because we are following the philosophy behind the 4D-VAR 
method. 

3.1. 4D-VAR: An Overview 

Four-dimensional variational data assimilation (hereafter 4D-VAR) is an efficient technique 
for incorporating observations in numerical forecasting models (Talagrand and Courtier, 
1987; Courtier and Talagrand, 1990). The 4D-VAR method consists in minimizing a scalar 
cost function measuring the deviation between the forecast and the observations. The phys- 
ical fields produced by data assimilation must correspond to the observations, while abid- 
ing to the known physical laws and/or statistical relations characterizing the system being 
treated (Le Dimet and Talagrand, 1986). 

Figure 5 shows an overview of the 4D-VAR method as applied to a classical forecasting 
problem, namely using the known state of a variable Xj/Q at some initial instant of time, to 
forecast its value xj/j at a later time T. This forecast is to be compared to an actual obser- 
vation V'obs- Here this first forecast falls outside of the observation's error bar. The 4D-VAR 
method uses the difference between the forecast xj/j and the observation i//obs to generate 
a new initial condition \j/Q that now produces an improved forecast at time T which 
is closer to the observation (Errico, 1997). The procedure can be repeated until some pre- 
set goodness-of-fit criterion between and Xf/o^^ is reached. In classical data assimilation 
applications (for example numerical weather forecasting), the point \f/o would corresponds 
to a field variable discretized on a single node (;', j) of a. N x N spatial mesh; 4D-VAR must 
then solve concurrently as many variational problems as there are variables times the number 
of spatial mesh points on which the numerical simulation used to advance the field from 
to T is performed. Data assimilation in numerical simulations is a computationally intensive 
undertaking! 

3.2. The Cost Function 

Generally, in variational problems, we want to minimize the cost function ^ : 



where /{yf.xj) is a scalar function, defined over a domain Q. and a time interval [0, T], of 
the state variable Ijf (Sanders and Katopodes, 2000). More precisely, in data assimilation, 




(14) 
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T 
Time 

Figure 5. Overview of vaiiational data assimilation. Point i^o is an estimate of the initial condition at time 
0. Using this initial condition, a forecast 1/7 is obtained at time T. The 4D-VAR method uses the difference 
between the forecast 1/7- and the observation i^oj,, to generate a new initial condition i/q which will produce 
a better forecast (1/^) at time T. 



we want to minimize the error between the forecast and the observations {ijfj — V^gbs) 
(c/ Figure 5): 



Jn 



{¥t - V^obs)^W(v^r - V^obs) dx dt, (15) 



where W is a matrix of statistical weights given by the instrumental errors in the obser- 
vations and T indicates the transpose. The squared residual {tjfj — \|fg^,^)^ is used instead 
of absolute values, to avoid introducing discontinuities when the cost function will be dif- 
ferentiated (next section). The physical equations (i.e. the continuous avalanche equation 
(Equations (6) -(8))), which can be schematically written as: 

<^{\ir,x,t) = 0, (16) 

are acting as constraints during the minimization (Talagrand and Courtier, 1987). 

In this paper, the cost function will use the time series of released energy as defined via 
Equation (5) as the state variable Xjf. Via the use of Equation (4), we get: 

d 



0- 



If we bin the time series of the released energy Er (Section 4.1.2), we have: 

A/? 
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where is the number of elements per bin. The cost function is then: 

/ = \ [^E,-Ef''fAt . (19) 



The identity matrix is used here for the observational error matrix W, because synthetically- 
generated observations will be used (Section 4) in what follows. Covariance error matrices 
are certainly an important and delicate point when dealing with real data. 

3.3. The Lagrangian Formulation 

We want to minimize the cost function ^ given the constraint S{^,'K,t) = 0. Since this is 
a problem of minimization with constraints, a Lagrangian formulation is used: 

^{\lf,X) = J{yif) + £j^X{x,t)-^{\lf,x,t)dxdt, (20) 

where X{x,t) are the Lagrange undetermined multipliers, also called adjoint variables 
(Sanders and Katopodes, 1999). The variational operator 5 is then applied on the Lagrangian 
to find its stationary points: 

One can use integration by parts to transfer the differential operators from the state variable 
yfto the adjoint variable A. For an arbitrary displacement 5A), the minimum is reached 
only when 5^ = (Daley, 1991). This indicates that the derivatives of the Lagrangian with 
respect to each direction must be zero: 

— =^{xif,x,t)=0, (22) 

and 

§|^Adj(A) + §^=0, (23) 

where Adj(A) represents the adjoint equations (Schroter, Seller, and Wenzel, 1993). 

As noted by Le Dimet and Talagrand (1986), this set of equations (Equations (22) and 
(23)) are the Euler-Lagrange equations. 

For the 2D avalanche model, the direct and adjoint equations are: 



3A ^) -|^(v(V2a)0)-^(v(V2a)0) avalanching 
^ ' FK{xo{t),yo{t)) stable 



(24) 



' V(V^A)3-^ - 33 v(V2a)3-^ - (25) 



and 



dx dx^ \ dx~ J dy^ \ dy^ J dA 

respectively, where the generic variable Ijf has been replaced by the variable A defined over 
the lattice, the adjoint variable A has been renamed A* as it is the adjoint variable associated 
with A, and T is a reverse time (z = T — t). 
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Evaluating the term is particularly delicate because the cost function ^ is not 
directly given in terms of a spatial quantity related to A, but as a time series of a nonlinear, 
non-local function of that quantity: 



1 d{Er-E\ 



obs\2 



2 

{Er-E: 



dA 



- dt 
dt 



dA 



dt . 

Jo Ab dA 
Using the theorem of implicit functions to rewrite the derivative gives: 

dZhinEr dZhinEr / dA 



dA 
^ {E,-Ef^)dZ^^E, 



dA 



dt 



dt 



(26) 



(27) 



provided ^ ^ so the term is only to be evaluated when the system is avalanching. 
Substituting Equation (27) in Equation (26): 



dA 



{E,.-E-^^)dZunEr 



Ab 
Ab 



dt 

^ dEr 

bin 



'dA 
dt 



dt 



'dA 
dt 



dt 



If we use the definition of E,- (Equation (17)), we get: 



d_^ 
dA 



Ab 



obs \ 



EE 

bin \tj 



dt^ 



'dA 
'dt 



dt 



(28) 



(29) 



The initial and boundary conditions for the adjoint equation arise from the integration 
by parts, namely the terms evaluated at the limits of the integrals. The initial conditions are 
A*|x=o = ™d the boundary conditions are: 



A*(0,>.,T) 
A*(x,0,T) 
dA* 

d^A* 
dx^ 



A*{L,,y,T) = Q 
A*(x,Lv,t) =0 
dA 



y=0 



dA* 
dx 

d^A* 



dx^ 



.1=0 



.v=L,- 



dy 
d^A* 
dy^ ,=0 
dA* 
dy 

d-A* 







= 



dy^ 



(30) 



>'=Lv 



Unfortunately, there is no efficient method to directly solve the Euler-Lagrange equations 
(Equations (22) and (23)); therefore, we must formulate the problem as an unconstrained 
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problem (Talagrand and Courtier, 1987). Although we have a random driver in our model, 
the model can still be regarded as being deterministic because the same (arbitrary) sequence 
of random numbers is being used throughout the minimization process within 4D-VAR. 
Therefore, the initial conditions, more precisely the "connected" portion of the system close 
to the avalanching threshold, will dictate the evolution of the system with time. This high- 
lights the fact that the cost function is an implicit function of the initial conditions: it is by 
varying the initial conditions that we will find the solution of the physical equations which 
minimizes the cost function (Ehrendorfer, 1992). In the language of optimal control theory, 
the initial conditions are the control variable in this problem (Lions, 1968). 

As most minimization algorithms requires the gradient of the function to be minimized, 
we need the gradient of the cost function with respect to the initial conditions. However, it is 
not possible to calculate this gradient analytically as the cost function is an explicit function 
of the final conditions (i.e. forecast). It turns out that the more efficient way to calculate 
the gradient of the cost function with respect to the initial conditions is to use the adjoint 
equations evaluated at t = T (Courtier and Talagrand, 1990): 



which then requires numerical integration of the adjoint equations from T = to T = T, i.e., 
backwards in time from t = T tot = 0. This is the reason behind the interest and use of the 
adjoint equations, and the need for a continuous form of the avalanche model. 

3.4. Minimization Algorithm 

The minimization of the cost function is usually carried out via a minimization algorithm 
such as steepest descent, conjugate gradient, or quasi-Newton methods. The steepest descent 
is a simple method but it converges linearly. In this study, the conjugate gradient is used 
because of its quadratic convergence. The quasi-Newton method also converges quadrat- 
ically and is popular among meteorologists. However, it requires the computation of the 
Hessian matrix. Even if an approximation of the Hessian is normally used, convergence 
problems may arise if it becomes nearly singular (Press et al., 1992). One can solve these 
kind of problem but this leads to an algorithm of a greater complexity, and it is usually more 
computationally intensive. 

The algorithm implementation of 4D-VAR data assimilation runs as shown on Figure 6. 
Starting from initial conditions obtained by current experimental observations or a previous 
numerical simulation, a direct simulation generates a traditional (DNS) forecast. After read- 
ing the observations taken at the end of the forecast period, the cost function is evaluated, 
followed by the evaluation of its gradient. If the gradient is smaller than a chosen tolerance 
which accounts for error due to numerical precision, then the minimum of the cost function 
has been reached and the optimal 4D-VAR forecast is obtained. If the cost function is not 
minimized, we must iterate. A new set of initial conditions are generated and used as the 
starting point of a new forecast. The cost function and its gradient are reevaluated and 
checked again against the termination criterion. This procedure is repeated until the latter is 
met. The iteration loop in Figure 6 takes place within the conjugate gradient minimization 
algorithm. It is also the conjugate gradient which modifies the initial conditions A^^ at each 



^/A'^=A*{x,y,T = T) 



(31) 



iteration: 




(32) 
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Read initial conditions 



DNS simulation 



Output: DNS forecast 



Read observations 



Evaluate cost function <r 



Output: 
optimal (4D-VAR) forecast 



No 
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New initial conditions 






DNS simulation 
New forecast 









Figure 6. Algorithm of 4D-VAR data assimilation. From initial conditions, a traditional (DNS) forecast is 
made. Then the observations are read. The value of the cost function's gradient indicates if minimization was 
achieved. If not, a new set of initial conditions are produced by backwards integration of the adjoint equations, 
a new forecast is produced, and the verification is repeated. This procedure is iterated until minimization of 
the cost function is achieved. The output is the optimal 4D-VAR forecast and associated initial conditions. 




where, for the k iteration of the conjugate gradient, p^j is the conjugate-direction vector 
multiplied by an amplitude a*. These conjugate direction vectors are obtained with the use 
of the gradient: 



P 



ij 



T+^PiJ (33) 



where 

' r^ ^Ic %-r 



ir^k . (34) 



and the gradients are evaluated using Equation (31). The amplitude a*is calculated with: 

„ , 1 (b-anf[b) - f(c)\ - (h-cf\J(h) -/{a)] 

2 {b-a)[/{b)-/{c)]-{b-c)[/{b)-/{a)] ^ ^ 
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where a, b, and c are modified initial conditions that where obtained through a line mini- 
mization method known as inverse parabolic interpolation, which iteratively finds a triplet 
of points such that the minimum of a parabola passing through these three points will be as 
close as possible to the minimum of the function in that given interval (Press et al, 1992). 

3.5. Beyond Classical 4D-VAR 

There are many ways in which our use of 4D-VAR goes beyond the "classical" formulation 
of 4D-VAR, as given on Figure 5. We are using 4D-VAR to assimilate data into a cellular 
automaton made continuous only for the purpose of writing adjoint equations. Grid and 
time steps are fractions of the characteristic scales of the hyper-diffusive processes involved 
and are not infinitesimals. However, as pointed out by Isliker, Anastasiadis, and Vlahos 
(2000), it is still possible to compute derivatives and thus operators. To the best of our 
knowledge, at the present time the only other area in which data assimilation techniques are 
being developed in conjunction with a cellular automaton is in seismic data assimilation of 
a stochastic random fault model (Rundle et al, 2003; Gonzalez et al, 2006). 

We are also using a time series of a global, model-produced variable to define the error, 
as opposed to the spatial state of the system measured at some time interval beyond the 
initial condition in which data is being assimilated. Assimilating time series of a global 
variable instead of (or together with) spatial states of a system at non-zero times t is truly 
compatible with the 4D-VAR approach, as opposed to 3D-VAR for instance. Models in 
environmental sciences have been, and will increasingly be, assimilating time-series of data 
(see for instance Carton, Chepurin, and Cao (2000) in Oceanology, Bertino, Evensen, and 
Wackemagel (2002) in Estuary modeling, or Eymin and Foumier (2005) in Geomagnetism). 

Finally, our model system includes an essential stochastic component, namely the driv- 
ing. Ocean and atmosphere general circulation or solid-Earth dynamics are all random 
driven (noisy) systems. Data assimilation, including 4D-VAR, in principle can be applied 
to any such system (e.g., Nichols, 2003; Moore et ah, 2004). Random driving, inaccessible 
to observations, cannot be assimilated, but is not seen as a systematic bias in the model (see, 
e.g., Anderson, 2003). 



4. Validation Experiments 

4. 1 . Experimental Design 
4.1.1. Synthetic Data 

The validation experiments use synthetic observations generated from the same SOC ava- 
lanche model used to carry out data assimilation (certainly an optimal situation from the 
data assimilation point of view). Of course, entirely independent realizations of the stochas- 
tic driving are used to generate the synthetic observations, and these realizations are not 
made available to 4D-VAR, as our challenge is precisely to see whether 4D-VAR can still 
adequately assimilate the observations without "knowing" about the stochastic driving. 

The synthetic energy-release time series cover one hyper-diffusion time, and were pro- 
duced on a 48 X 48 grid, with time step At/Xy = 5.5 x 10^", and forcing parameters as 
given in Section 2.2.2. 
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4.1.2. Thresholding and Binning 

Figure 7 illustrates a segment of representative simulation run. The gray line in the main 
panel of Figure 7 is a typical time series for the energy released by the avalanches. Time is 
measured in units of the magnetic hyper-diffusion time scale, and energy in arbitrary units. 
This arbitrary energy scale can be mapped to the standard flare classification by dividing 
the peak energy that covers three decades (panel B of Figure 4), in four ranges equal in 
logarithmic size. Thus, the following association can be made: P < 8 are B-class flares, 
8 < P < 40 are C-class flares, 40 < P < 200 are M-class flares, and P > 200 are X-class 
flares. 

Data assimilation is carried out on a binned version of the energy time series, shown as a 
thick solid line in Figure 7. A bin width of Ab = 100 (i.e. 100 time steps) was chosen, as it is 
large enough to remove the small structures but small enough to keep the general features of 
the avalanche. This binning facilitates the minimization of the cost function by eliminating 
the fine structure details, which are unnecessary as we are mostly interested in forecasting 
the flaring time, peak flux, and total released energy. 

Two parameters are used to build statistics. First, there is an energy threshold (horizontal 
dashed lines in Figure 7). Only avalanches with an energy above the threshold are considered 
in the statistics. The signals below the threshold are treated as being noise or low energy 
avalanches that are unimportant. Here the two thresholds that will be used in this paper 
are shown: a threshold at 90 and a larger one at a value of 200 energy units. The lower 
threshold retains the model's equivalent of upper half of the medium size (M-class) and 
large size (X-class) flares while the larger one will only take the high energy X-flares into 
account. These classes of flares are the ones having the most important effects on Earth. The 
second parameter is the forecast window. A forecast window of a range of 5t = 0.55 x 10^^ 
is depicted in the upper left corner of the main panel of Figure 7. The forecast window 
is the maximum time interval between an observed peak and a forecasted peak to have a 
match. In the case of a forecast window of 5t = 0.55 x 10^^, the predicted peak can either 
be 5t — 0.275 x 10^^ before or after the observed peak. Hence, a smaller forecast window 
implies a more precise forecast. 

The top four panels of Figure 7 show the evolution of the avalanching regions at different 
stages throughout the spatio-temporal evolution of a large avalanche. Starting from an initial 
perturbation at a given random point, the "preflare" stage begins with an avalanching region 
occupying a small region (panel A, Figure 7) of the domain that will gradually increase 
(panel B, Figure 7) to reach a large size, occupying an important portion of the domain, 
at the "impulsive stage" at the end of which the peak is reached (panel C, Figure 7). At 
this point, the fragmented hyper-diffusion front has reached the sides of the domain. In 
the "decay" stage, the avalanching regions start to decrease in size (panel D, Figure 7). 
Eventually, the system will return to a stable state, and driving resimies until the triggering 
of the next flare/avalanche. 

4.1.3. Maintaining the SOC State 

To prevent the system from leaving the SOC state due to the 4D-VAR correction made to 
the initial conditions, the lattice energy of the corrected initial conditions is compared to 
the average lattice energy, £/, of the SOC state. If the lattice energy of the corrected initial 
conditions is not found within the variance a, the 4D-VAR correction is adjusted by a factor 
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Figure 7. A small segment of a run of the avalanche model. Several aspects of the experimental design are 
displayed in the bottom panel. The gray line is an original time series of released energy generated by the 
avalanche model (Equation (6)). The thick solid line is the same series which has been binned by using bins 
of width of AZ) = 100 time steps. Only the avalanches above the energy threshold (horizontal dashed lines) 
will be considered. In the top left comer of the main panel, a typical forecast window is shown. It determines 
the maximum time interval between an observed and forecasted avalanche in order to have a match. The 
top four panels show the avalanching regions evolving in space and time. From the initial perturbation at 
a random point, the avalanching region increases from a small region (A) to a large region occupying an 
important portion of the domain where the hyper-diffusion has reached the sides of the domain (B and C). 
The avalanche is typically fragmented in its decay phase (D). 



e defined as: 



\E,-E,\ 



\Ei-E,\ 



if El > El + a 

if Ei-a <Ei <Ei + a 

if El <Ei-a 



(36) 



which brings the initial conditions' lattice energy within the variance. 
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4.1.4. Random Numbe r Sequences 

For the minimization procedure within the 4D-VAR algorithm, the same seed is used to 
initialize the random number generator, so that during each iteration within 4D-VAR (c/ Fig- 
ure 6), the same sequence of perturbations is added in the same order to the same mesh 
points. 

4.1.5. Running 4D-VAR 

With a realization of stochastic driving different from the one that has been used to produce 
the synthetic observations (panel A, Figure 8), a DNS "forecast" is produced (panel B, 
Figure 8). The same initial conditions were used in both cases, but the different realizations 
of the driving have led, perhaps not surprisingly, to very different time series of energy 
release. Although the DNS run has reproduced the small avalanche at ? = 1.1 x 10^^, it 
missed the large one at r = 3.5 x 10^^. The 4D-VAR run (panel C, Figure 8), using the 
same driving realization as the DNS forecast but with corrected initial conditions, has cor- 
rectly reproduced the large avalanche. Figure 8 is a typical case when the 4D-VAR method 
performs well. There are cases where the DNS is already quite good, and 4D-VAR cannot 
produce significant improvement; this is in fact expected, and moreover is the reason why 
true forecasting may be possible despite the stochastic nature of the forcing (more on this in 
the concluding section). In this representative sample run, minimization of the cost function 
by 4D-VAR was achieved in a mere five iterations of the conjugate gradient. This 4D-VAR 
procedure can thus be summarized as follows: 

• Production of synthetic observations with a given sequence of random numbers 

• A DNS run using a different sequence of random numbers is performed 

• A 4D-VAR run with the same sequence of random numbers as used for the DNS run 
is performed 

• As the 4D-VAR method benefits from corrected initial conditions, it is more successful 
than the DNS run in reproducing the observations 

4.2. Performance 

4.2.1. Performance Measurements 

In anticipation of true forecasts, it is instructive to analyze the performance of the 4D-VAR 
runs in terms of matches, misses, and false alarms. Only the avalanches with a peak above 
the energy threshold are considered. With the forecast window centered at the peak of each 
observed avalanche, the forecast is examined to see if one of its avalanches takes place 
inside the window. If this is the case, we have a match. If we use an energy threshold of 200 
and a forecast window of 5t = 0.55 x 10^^, Figure 8 has two matches: the large avalanche 
at f = 3.4 X 10^^ and a smaller one at f = I.l x 10"^. The avalanche at ? = l.I x 10"^ is 
a possible match for either one of the avalanches at f = l.I x 10^^ and t = 1.38 x 10^^. 
Such situations are treated as a single match. Hence, a match can be considered as an event 
happening in a time interval, determined by the forecast window, regardless if there is a 
single avalanche or multiple consecutive avalanches. A miss happens when the observed 
avalanche does not have a counterpart in the 4D-VAR run. Figure 8 (panel C) has two misses: 
att = 2.9 x 10^^ and t = 4.3 x 10^^. False alarms are avalanches appearing in the 4D-VAR 
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Figure 8. A sample run of 4D-VAR data assimilation. The synthetic observations being assimilated are in 
panel A. Even using the same initial condition A{x,y,t = 0), a single DNS "forecast" (panel B) often produces 
poor results, a direct consequence of the stochastic nature of the driving process. Retaining the same driving 
but allowing 4D-VAR to alter the initial condition (panel C) results in a much better representation of the 
observations. 



run which do not have counterparts in the observations. Figure 8 (panel C) has one false 
alarm att= 1.7 x 10^^. The avalanche at ? = 0.1 x 10^^ is not considered as being a false 
alarm as it is an artifact of the 4D-VAR method. The correction to the initial conditions 
has left the system in an unstable state so the forecast time series began with an avalanche. 
Finally, the two small avalanches at f = 0.6 x 10^^ and r = 2 x 10^^ in the observations are 
not included in the analysis as they fall below the energy threshold. 

The final value of the cost function is not necessarily an optimal measure of a successful 
run, as it simply measures the mean quadratic difference between the observation and model 
time series over the whole duration of the assimilation interval. In the flare-forecasting con- 
text, one is primarily interested in predicting the timing of discrete events, namely the largest 
flares/avalanches, and ideally also a measure of their peak flux and/or total released energy. 
Consequently, we define the following quality factor (g) to assess numerically whether a 
given run was successful or not in "catching" avalanches in the observations: 



E„ 



match ^" missV^'o'/ false alarm V / 

In Equation (37), the first term is the sum, for all pair of matching flares, of the inverse of the 
error difference between the total energy of the observed (Eg) and modeled (Ef) avalanches 
multiplied by a factor a. The two next terms are penalties due to the misses and false alarms. 
The penalty are defined as the energy of the missed (Eg) and false alarm (Ef) avalanches 
normalized by the total energy released by all the observed avalanches {E,o, ). These three 
individual contributions to Q are then each assigned a distinct weighting factor, chosen here 
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Table 1. Perfomiance of the system in matching n consecutive avalanches before the first miss. The results 
from the regular DNS runs (numbers in square brackets) were added for comparison purpose. The second 
and third columns lists the number of runs that had a match or a miss for the ji* avalanche, respectively. 
Fourth and fifth columns displays the number of false alarms between the and {n — 1)"' avalanches and 
the corresponding numbers of runs which had these false alarms. The sixth column keeps track of the runs 
that have neither a match nor a miss because there ai'e no longer any avalanches above the threshold. 
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as a = 4, /3 = 2, and 7= 1, so that the largest weight is for the matches. Note also that under 
these definitions, missing a large avalanche incurs a larger penalty than missing a small one. 
The misses have a larger weight than the false alarms because a miss leaves us unprepared 
to handle the consequence of the flare. On the other hand, in a false alarm we may incur 
additional costs even though no flare is triggered. 

4.2.2. Performance Statistics 

Statistics of hits, misses, and false alarms has been gathered for 100 4D-VAR runs. These 
runs have been realized with combinations of ten sets of observations and ten sets of distinct 
random number sequences each setting a distinct realization of stochastic driving within the 
SOC model. For all runs included in the statistical analysis to follow, an energy threshold of 
90 and a forecast window of 5t = 0.55 x 10^^ were used. 

The idea here is to investigate the performance of the system over the assimilation inter- 
val (we are still not forecasting at this stage), in terms of the number of consecutive matches 
that could be obtained before the first miss. The results are tabulated in Table 1 for each 
avalanche (first column). The second and third columns lists the number of runs that had a 
match or a miss for the n* avalanche, respectively. The fourth and fifth columns displays 
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the number of false alarms between the «* and (« — 1)* avalanches and the number of 
runs which had these false alarms. The last column accounts for the fact that the runs do 
not have the same number of avalanches above the threshold. The results of applying the 
same analysis to regular DNS runs are added for comparison purpose (numbers in square 
brackets). 

Examining the first row, we see that, of the 100 runs, 65 4D-VAR runs succeeded in 
reproducing the first avalanche (above threshold) while 35 runs missed it. Of the 65 runs 
with the match, 44 had no false alarms between the beginning of the run to the first avalanche 
while 21 of them had between one and four false alarms in this time interval. The DNS runs 
were less successful in reproducing the first avalanche as 41 of them were able to do so. The 
proportion of false alarms before the first avalanche is similar for both the 4D-VAR and the 
DNS runs. Every run, 4D-VAR and DNS, had at least one avalanche above the threshold. 
This first result is quite interesting: 44% of the 4D-VAR runs successfully reproduced the 
first observed avalanche without making a false alarm; this is a 60% increase in performance 
compared to the DNS runs. If we move on to the second row, of the 65 4D-VAR runs which 
reproduced the first avalanche, 27 of them were also able to reproduce the second avalanche. 
The remaining 38 runs either missed the second avalanche (28 runs) or did not had a second 
avalanche higher than the threshold (10 runs). The number of false alarms between the first 
and second avalanches follows the same trends as the ones before the first avalanche. The 
difference between the 4D-VAR and DNS runs is less pronounced as 22 DNS runs matched 
the second avalanche. This implies that the 4D-VAR method is very good at reproducing 
the first avalanches but afterward the performance degrades to become equivalent to the 
DNS method. However, the 4D-VAR method did reproduce runs of five and six consecutive 
avalanche when the DNS runs no longer produced avalanches above the energy threshold. 
Finally, the number of consecutive matches continues to decrease until either an avalanche 
is missed or all observed avalanches are reproduced. 

The reason behind this somewhat sudden decrease in performance comes from the fact 
that avalanches, especially large ones, have a deep impact on the energy distribution on the 
lattice. Thus, even with synthetic data produced by the same model used for data assimila- 
tion, it become more difficult to reproduce the next avalanche, which explains the constant 
decrease in the number of matches. This is a direct consequence of the stochastic nature of 
the driving process, and already heralds the finite forecasting window that can be expected 
when operating in true-forecasting mode. Nevertheless, one tenth of the total 100 runs could 
still reproduce the first three avalanches. Only two runs reproduced the first six avalanches 
above threshold (although 61 runs did not have a sixth avalanche to reproduce). However, 
at this point, the size of the time interval for the run (r = 5.5 x 10^^) is felt as nine runs that 
match the first two runs did not have a third avalanche above the threshold. The runs used 
here have an average of four avalanches above the energy threshold. 

4.2.3. Code Performance 

At a spatial resolution of 48 x 48, it takes about five minutes of wallclock time to complete 
a data assimilation run over 10 000 time steps on an Intel Itanium 2 processor. Note that no 
particular efforts were made towards code optimization. Although the writing of the adjoint 
equations in the 4D-VAR implementation can be difficult, the resulting data assimilation 
scheme is quite fast. The hope is that this performance will not degrade too much once 
real data, including observational errors and covariance matrices, will be used for true flare 
forecasting. 
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5. Conclusion: Towards Forecasting 

What have we actually achieved with this whole data assimilation procedure? Let us go 
back to the idea of predicting flare occumnce and energy release via a direct numerical 
simulation based on a SOC avalanche model. To run such a model forward in time, two 
things must be specified: i) the current state of the lattice at time to, to be used as initial 
condition for the forecast, and ii) the spatio-temporal sequence of perturbations throughout 
the forecasting interval. If the latter are truly stochastic in nature, they remain completely 
unknown at to. In the context of Parker's nano-flare hypothesis, which provides the physical 
underpinning of avalanche models in the present situation, these perturbations amount to 
small kinks between adjacent magnetic fieldlines, building up in response to slow forcing 
of the structure's photospheric magnetic footpoints. Not only do these kinks develop in 
response to stochastic forcing, but they also occur on spatial scales inaccessible to direct 
observation. This means that flare forecasting using an avalanche model will always retain 
a stochastic component. 

What we have shown in this paper is that past avalanching behaviour can be reproduced 
reasonably well using data assimilation, even without detailed knowledge of the stochastic 
forcing. At the end of the assimilation interval, the lattice is in a state that is compatible 
with (and determined by) past flaring behaviour. This then represents the optimal initial 
condition from which to carry out a DNS forecast. This, of course, does not guarantee that 
any given DNS forecast will be accurate, but that an ensemble of DNS forecasts will show 
avalanching patterns that reflect, at least in part, the state of the lattice at fo- In particular, 
if this initial condition is characterized by a large, connected portion of the lattice close to 
the stability threshold, then one would expect that a large avalanche is likely in the near 
future, irrespective of the spatio-temporal details of the forcing. It should then be possible 
to forecast with some accuracy the largest upcoming avalanches using statistical ensembles 
of DNS runs. Small avalanches, on the other hand, will depend more sensitively on details 
of the (stochastic) forcing. In such cases, even ensembles of DNS runs are less likely to 
produce useful forecasts. In the space-weather context, this is not too problematic, since it 
is precisely the largest flares/avalanches for which one is seeking accurate forecasts. These 
expectation are examined in detail in the following paper in this series (Belanger, Vincent, 
and Charbonneau, in preparation). 

SOC avalanche models are certainly not the only modeling framework for solar flares 
within which data assimilation can be carried out. A good case in point is the CISM project 
(Wiltberger and Baker, 2006; Siscoe and Solomon, 2006), an ambitious sun-to-ionosphere 
data assimilation framework based on a suite of coupled 3D MHD models. The attractive 
feature of SOC models — arguably their single most attractive feature — is that, by all 
appearance, they correctly capture the global statistical behaviour of energy release by solar 
flares, including in particular its power-law form and associated exponent. This makes such 
models ideal candidates for data-assimilation-based forecasting, despite their extreme phys- 
ical simplicity and the inevitable stochastic effects associated with the driving mechanism 
responsible for energy injection into the overlying coronal magnetic structures. 

We note, in closing, that within Parker's physical picture of coronal structures being 
forced by photospheric fluid motions, such stochastic effects would also need to be incor- 
porated into any full-scale MHD models of coronal structures to be used for flare fore- 
casting. Data assimilation could help here as well (see, e.g., Schrijver and DeRosa, 2003), 
but spatially- and/or temporally-unresolved fluid motions would again introduce a form of 
stochastic "noise" in the MHD simulations, with inevitable degradation of forecasting per- 
formance even if such models would be based on true physical equations rather than some 
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largely ad hoc cellular automaton. The latter, however, is such a simpler model to simulate 
that it becomes possible in practice to carry out ensemble DNS forecasting in reasonable 
wallclock time even on mid-range computational platforms. This is an essential requirement 
of operational forecasting. 
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